
Add a stop-the-world, serial Compressor #1340


Merged: 24 commits, Jul 24, 2025

Conversation

no-defun-allowed
Collaborator

This PR adds a Compressor-esque garbage collector, which is based on MarkCompact but uses a mark bitmap for forwarding metadata like the Compressor.
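As a rough illustration of the idea (a hypothetical, simplified sketch, not the code in this PR): the Compressor marks the first and last word of each live object in a side bitmap, records per-block live-byte totals in an offset vector, and derives a forwarding address by adding the live bytes preceding an object to the space start. All names and sizes below are toy assumptions.

```rust
const WORD: usize = 8; // bytes per word (assumption for this toy)
const BLOCK_WORDS: usize = 64; // words per block (toy size)

/// Count live bytes covered by mark-bit pairs in `bits[0..limit]`.
/// Bits come in (first-word, last-word) pairs; a pair at word indices
/// (f, l) covers (l - f + 1) * WORD bytes. Single-word objects, which
/// would set only one bit, are ignored in this toy.
fn live_bytes_before(bits: &[bool], limit: usize) -> usize {
    let mut live = 0;
    let mut first: Option<usize> = None;
    for (i, &b) in bits.iter().enumerate().take(limit) {
        if b {
            match first.take() {
                None => first = Some(i),
                Some(f) => live += (i - f + 1) * WORD,
            }
        }
    }
    live
}

/// Forwarding address = space start + live bytes before the block
/// (from the offset vector) + live bytes before the object within
/// its block (recomputed from the mark bitmap).
fn forward(space_start: usize, offset_vector: &[usize], bits: &[bool], addr_word: usize) -> usize {
    let block = addr_word / BLOCK_WORDS;
    let in_block = addr_word % BLOCK_WORDS;
    let block_bits = &bits[block * BLOCK_WORDS..(block + 1) * BLOCK_WORDS];
    space_start + offset_vector[block] + live_bytes_before(block_bits, in_block)
}

fn main() {
    // Object A occupies words 0..=4 (40 bytes); object B starts at word 6.
    let mut bits = vec![false; BLOCK_WORDS];
    bits[0] = true;
    bits[4] = true;
    bits[6] = true;
    bits[7] = true;
    assert_eq!(forward(0x1000, &[0], &bits, 6), 0x1000 + 40);
}
```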

@k-sareen k-sareen self-requested a review July 15, 2025 05:37
@no-defun-allowed
Collaborator Author

The current failure to compile mock_test_allocator_info is a one-line fix, but I don't think the mock VM tests are going to work, as the mock VM uses header metadata and the Compressor requires mark bits to be in a side bitmap.

@k-sareen
Collaborator

Thanks for the PR @no-defun-allowed! The major thing I see is that currently the Compressor plan is not using the LOS. Was this intentional or accidental?

@no-defun-allowed
Collaborator Author

Thanks for the PR @no-defun-allowed! The major thing I see is that currently the Compressor plan is not using the LOS. Was this intentional or accidental?

Definitely accidental. Thanks for the review.

@qinsoon
Member

qinsoon commented Jul 16, 2025

The current failure to compile mock_test_allocator_info is a one-line fix, but I don't think the mock VM tests are going to work, as the mock VM uses header metadata and the Compressor requires mark bits to be in a side bitmap.

You can cherry pick this commit to the PR: qinsoon@ac90d4b.

It runs mock VM with side metadata. Compressor currently fails for tests that are not using contiguous space.

@no-defun-allowed
Collaborator Author

You can cherry pick this commit to the PR: qinsoon@ac90d4b.

Thanks!

It runs mock VM with side metadata. Compressor currently fails for tests that are not using contiguous space.

Right. Should we skip such tests when running with the Compressor?

@qinsoon
Member

qinsoon commented Jul 16, 2025

It runs mock VM with side metadata. Compressor currently fails for tests that are not using contiguous space.

Right. Should we skip such tests when running with the Compressor?

Is there any intrinsic reason why the compressor cannot run with discontiguous spaces? I feel there is none. It would be better to allow running Compressor with discontiguous spaces.

At the same time, there might be practical reasons that this might be not easy, or it requires too much efforts. It is also okay to skip those tests.

@no-defun-allowed
Collaborator Author

Is there any intrinsic reason why the compressor cannot run with discontiguous spaces? I feel there is none. It would be better to allow running Compressor with discontiguous spaces.

I feel that way too. The judgement was more about how closely we want to follow the Compressor paper - one simple approach to support discontiguous spaces would be to compress [sic] each chunk of a space, which would also give a simple way to parallelise (which is not the way in the paper).

The block offset vector is currently a Rust Vec<AtomicUsize>, and I do the address-to-block calculation assuming a contiguous heap; I should be using some other metadata mechanism for that anyway.
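The contiguous-heap assumption in that calculation amounts to something like the following (a hypothetical sketch, not the PR's code; the block size is an assumption):

```rust
// Hypothetical sketch of the address-to-block calculation being described:
// the block index is just the distance from the space start divided by the
// block size, which only works when the space is one contiguous range.
const BLOCK_SIZE: usize = 4096; // toy block size, an assumption

fn block_index(space_start: usize, addr: usize) -> usize {
    (addr - space_start) / BLOCK_SIZE
}

fn main() {
    // An address two blocks past the space start falls in block 2.
    assert_eq!(block_index(0x10000, 0x10000 + 2 * BLOCK_SIZE + 8), 2);
}
```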

@k-sareen
Collaborator

The block offset vector is currently a Rust Vec<AtomicUsize> which I do the address-to-block calculation assuming a contiguous heap; I should be using some other metadata mechanism for that anyway.

For what it's worth, ART also uses a uint32_t* for the offset vector calculation.

@no-defun-allowed
Collaborator Author

Now the block offset vector and the marks are in their own local metadata. I'm still undecided on if and how to support discontiguous spaces, since the forwarding algorithm in the Compressor paper seems very tied to having a contiguous space.

@qinsoon
Member

qinsoon commented Jul 18, 2025

The failures on 32 bits were caused by this: #1342. However, you reverted the side metadata commit for mock VM, so the bug won't appear in this PR anyway.

@qinsoon
Member

qinsoon commented Jul 18, 2025

Now the block offset vector and the marks are in their own local metadata. I'm still undecided on if and how to support discontiguous spaces, since the forwarding algorithm in the Compressor paper seems very tied to having a contiguous space.

If that's the case, just skip the tests that use discontiguous space for Compressor.

@no-defun-allowed
Collaborator Author

I added an MMTK_PLAN=discontiguous set to the CI script for tests which require discontiguous heaps. The script also removes the Compressor from the all set on i686, since i686 defaults to a discontiguous heap; some tests like test_vm_layout_default should still be tested on x86_64 but will panic on i686.

@qinsoon qinsoon added the PR-extended-testing (Run extended tests for the pull request) label Jul 22, 2025
@wks
Collaborator

wks commented Jul 22, 2025

I'm happy to approve this PR unless @wks you have some concerns?

Sorry about the delay. I'll review this PR now.

@k-sareen
Collaborator

Forgot to disable compressed oops in the tests. Fixed now hopefully.

@wks
Collaborator

wks commented Jul 22, 2025

A high-level question: will this algorithm work if some objects have stricter alignment requirements than others? For example, object A is 8-byte aligned and object B is 16-byte aligned, but A is 40 bytes in size, not a multiple of 16 bytes. They are laid out like this: A and B are bytes in objects A and B respectively, . is a gap byte between objects, and m is a mark bit.

0        8        16       24       32       40       48       56       64       
AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA AAAAAAAA ........ BBBBBBBB BBBBBBBB
m                                   m                 m        m

The algorithm described in forwarding::Transducer will think there are 40 live bytes before object B. But since B requires 16-byte alignment, it cannot be moved to offset 40. It must be moved to offset 48 instead.
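The arithmetic of the problem can be shown with a toy helper (hypothetical, not mmtk-core's align_allocation_no_fill): rounding the naive forwarding offset up to the object's alignment shows where padding must be inserted.

```rust
// Round `offset` up to the next multiple of `align` (align must be a
// power of two). Hypothetical helper for illustration only.
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

fn main() {
    // 40 live bytes precede B, but B needs 16-byte alignment:
    // it must be forwarded to offset 48, not 40.
    assert_eq!(align_up(40, 16), 48);
}
```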

FYI, the existing mark compact space adds padding bits for the alignment when calculating forwarding addresses.

// MarkCompactSpace::calculating_forwarding_pointer
for obj in self
    .linear_scan_objects(from_start..from_end)
    .filter(Self::to_be_compacted)
{
    let copied_size =
        VM::VMObjectModel::get_size_when_copied(obj) + Self::HEADER_RESERVED_IN_BYTES;
    let align = VM::VMObjectModel::get_align_when_copied(obj);
    let offset = VM::VMObjectModel::get_align_offset_when_copied(obj);
    // move to_cursor to aligned start address
    to_cursor = align_allocation_no_fill::<VM>(to_cursor, align, offset);
    // move to next to-block if there is no sufficient memory in current region

If Compressor cannot handle this case and it is hard to fix, we can temporarily check VMBinding::MAX_ALIGNMENT when initializing the Compressor plan and panic if we cannot support that VM.

@no-defun-allowed
Collaborator Author

My code doesn't check alignment at all, and I'm not sure if there is a good way to implement variable alignment; we don't get much freedom in computing forwarding pointers from the mark bitmap. (The same applies to supporting discontiguous heaps with one Compressor setup/without regions.)

Is the size of an object always a multiple of its alignment? If not, I probably should take the alignment into account when determining where the last word of an object is.

@wks wks left a comment (Collaborator)

In MMTk, the starting address of an object is not necessarily the raw address of the ObjectReference. See the doc comment of ObjectReference in src/util/address.rs. Currently, JikesRVM may add a hash code in front of an object to implement address-based hashing. CRuby currently prepends a word to store the object size (which is a workaround and is supposed to be removed in the future).

But some VMs guarantee that the starting address is always the ObjectReference's raw address. They will set ObjectModel::UNIFIED_OBJECT_REFERENCE_ADDRESS to true, and it will simplify some algorithms inside mmtk-core. Currently the OpenJDK guarantees this.

So you may check if UNIFIED_OBJECT_REFERENCE_ADDRESS is true and reject unsupported bindings when using Compressor.

You can also try to fix the code to support VMs where ObjectReference is not necessarily the starting address. That may involve setting the mark bits at the starting address instead of at ObjectReference, and scanning the VO bits in addition to the mark bits to find the ObjectReference when copying.


pub fn test_and_mark(object: ObjectReference) -> bool {
    let old = forwarding::MARK_SPEC.fetch_or_atomic(
        object.to_raw_address(),

Here you set the mark bit on the raw address. But the raw address may not be the starting address of an object (unless UNIFIED_OBJECT_REFERENCE_ADDRESS is true).

let mut in_object = false;
MARK_SPEC.scan_non_zero_values::<u8>(start, end, &mut |addr: Address| {
    if !in_object {
        let object = ObjectReference::from_raw_address(addr).unwrap();

Note that the starting address may not be the raw address of ObjectReference (unless UNIFIED_OBJECT_REFERENCE_ADDRESS is true). If the mark bits are set on the starting and the ending words of an object, we will not be able to obtain ObjectReference from the starting address alone. But if we also use the VO bits, we can scan the VO bits to identify the ObjectReference.

Note that the existing MarkCompactSpace uses the VO bits as a local metadata even when we don't enable global VO bits.

Also note that we have ObjectModel::get_reference_when_copied_to(from: ObjectReference, to: Address) -> ObjectReference. It doesn't work here: it needs a from reference which points to the object before copying in order to get the type information and compute the distance between the starting address and the ObjectReference.

@wks
Collaborator

wks commented Jul 22, 2025

You can also try to fix the code to support VMs where ObjectReference is not necessarily the starting address. That may involve setting the mark bits at the starting address instead of at ObjectReference, and scanning the VO bits in addition to the mark bits to find the ObjectReference when copying.

Well, it may be hard to do the testing. JikesRVM is legacy software and needs special build environments to test. Neither CRuby nor Julia can use mark-compact because they need object pinning, which mark-compact cannot support. Maybe we just support VMs where UNIFIED_OBJECT_REFERENCE_ADDRESS == true, and leave JikesRVM to another day.

@no-defun-allowed
Collaborator Author

"If the mark bits are set on the starting and the ending words of an object, we will not be able to obtain ObjectReference from the starting address alone" sounds conclusive enough. I'll assert UNIFIED_OBJECT_REFERENCE_ADDRESS for now.

@k-sareen
Collaborator

JikesRVM won't run currently anyway since this implementation of the Compressor requires contiguous spaces. When we have discontiguous space support, then maybe we can look at JikesRVM support.

@no-defun-allowed
Collaborator Author

I'll assert UNIFIED_OBJECT_REFERENCE_ADDRESS for now.

But then I can't run mock VM tests with the Compressor, and adding const UNIFIED_OBJECT_REFERENCE_ADDRESS: bool = true; to MockVM produces this test failure:

---- vm::tests::mock_tests::mock_test_internal_ptr_normal_object::interior_pointer_in_normal_object stdout ----
start = 0x20000000000, end = 0x20000000010, obj = 0x20000000008
thread 'vm::tests::mock_tests::mock_test_internal_ptr_normal_object::interior_pointer_in_normal_object' panicked at src/util/address.rs:627:9:
The binding claims unified object reference address, but for object reference 0x20000000008, ref_to_object_start() returns 0x20000000000

@wks
Collaborator

wks commented Jul 22, 2025

I'll assert UNIFIED_OBJECT_REFERENCE_ADDRESS for now.

But then I can't run mock VM tests with the Compressor, and adding const UNIFIED_OBJECT_REFERENCE_ADDRESS: bool = true; to MockVM produces this test failure:

---- vm::tests::mock_tests::mock_test_internal_ptr_normal_object::interior_pointer_in_normal_object stdout ----
start = 0x20000000000, end = 0x20000000010, obj = 0x20000000008
thread 'vm::tests::mock_tests::mock_test_internal_ptr_normal_object::interior_pointer_in_normal_object' panicked at src/util/address.rs:627:9:
The binding claims unified object reference address, but for object reference 0x20000000008, ref_to_object_start() returns 0x20000000000

Oops. There are test cases for situations where the starting address is not equal to the ObjectReference. (IIRC I added some similar tests too, but they seem to have gone.)

Maybe we can use Cargo features to conditionally enable compressor-related tests and set UNIFIED_OBJECT_REFERENCE_ADDRESS to true, and conditionally disable those tests that conflict with those options.

@k-sareen
Collaborator

Maybe this can be a warn!() if UNIFIED_OBJECT_REFERENCE_ADDRESS is not true. That might let us side-step this issue in the short term.

@qinsoon
Member

qinsoon commented Jul 22, 2025

So currently the compressor has the following requirements/limitations:

  • Requires contiguous space
  • Requires object references to be the same as the object start addresses
  • Objects should not grow size during copying.
  • Ignores object alignment (assumes objects are aligned to word-size?)

I feel it is worth discussing these points in a meeting, to see which are intrinsic to the algorithm and which are introduced by the implementation.

It is not necessary to address the above issues before this PR can be merged, but it is good for us to understand the issues.

@qinsoon
Member

qinsoon commented Jul 23, 2025

The following is my understanding about the limitations:


EDIT: I realized that an annoying thing is that during forwarding, you can no longer access object metadata. Calculating the forwarding address can only be done with pre-computed side metadata. That means if we access any object metadata during the pre-computation (such as copied size, alignment, etc.), that metadata needs to be recorded somehow in the side metadata so that during forwarding we can do the same calculation again using side metadata alone.


* Requires contiguous space

Supporting discontiguous space seems doable, but it complicates things, especially the forwarding part.

First, the encoding should be fine with a discontiguous space, as the offset vector itself does not include any address. While computing the offset vector, the current code uses the next block's start address to calculate the remaining live bytes in the previous block -- this assumes contiguous spaces. We could instead visit the block end address once to calculate the remaining live bytes. This should be easy to fix.

self.live + (current_position - self.last_bit_visited) + 1

Then, forwarding. The forwarding address is computed as below:

self.first_address + state.live

To support discontiguous space, we cannot use the start of the space, and we need some extra information (e.g. a table that maps block index to block start) to know the current block start, then apply the offset internal to a block to it -- state.live % block_size.
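That table lookup could be sketched like this (hypothetical names and a toy block size, not an actual implementation):

```rust
// Hypothetical sketch of the suggestion above: with a discontiguous space,
// map the accumulated live bytes to a to-block via a block-start table,
// then apply the offset internal to that block.
const BLOCK_SIZE: usize = 4096; // toy block size, an assumption

fn forward_discontiguous(block_starts: &[usize], live: usize) -> usize {
    let block = live / BLOCK_SIZE; // index of the to-block the copy lands in
    let offset = live % BLOCK_SIZE; // offset internal to that block
    block_starts[block] + offset
}

fn main() {
    // Two non-adjacent blocks: live bytes past the first block land in the
    // second block, at its own start address.
    let block_starts = [0x10000, 0x30000];
    assert_eq!(forward_discontiguous(&block_starts, BLOCK_SIZE + 8), 0x30008);
}
```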


* Requires object references to be the same as the object start addresses

This should be purely an implementation limitation. In theory, the implementation could always use object.to_object_start() and never use object.to_raw_address(). And when it needs to return an object reference from a forwarded address, use ObjectModel::get_reference_when_copied_to.


* Ignores object alignment (assumes objects are aligned to word-size?)

It is also possible to properly support object alignment.

When we encounter an object while computing the offset vector, we can compute the required alignment in the same way as mark compact does. The diff between the aligned to-cursor and the previous to-cursor is the alignment padding that can be added to the live bytes prior to the object.

let align = VM::VMObjectModel::get_align_when_copied(obj);
let offset = VM::VMObjectModel::get_align_offset_when_copied(obj);
// move to_cursor to aligned start address
to_cursor = align_allocation_no_fill::<VM>(to_cursor, align, offset);


* Objects should not grow size during copying.

If we calculate the object size through the mark bits, it will be the original size before copying. But we can query ObjectModel::get_size_when_copied and apply the diff to the size when calculating live bytes for the offset vector.

@no-defun-allowed
Collaborator Author

When we encounter an object while computing the offset vector

apply the diff to the size when calculating live bytes for the offset vector

The issue is that we don't only use the offset vector -- we also scan the mark bitmap between the start of the block containing an object and that object. So we need to be able to encode the space taken in the mark bitmap. I think this is possible for alignment/padding, as there should be enough space in the mark bitmap to adjust the first/last word marks, but there might not be for objects which grow when copied.

@no-defun-allowed
Collaborator Author

no-defun-allowed commented Jul 23, 2025

To support discontiguous space, we cannot use the start of the space, and we need some extra information (e.g. a table that maps block index to block start) to know the current block start, then apply the offset internal to a block to it -- state.live % block_size.

The forwarding loop doesn't account for any chunk boundaries either. I'm undecided on whether preventing objects from crossing boundaries can be done with the mark bitmap or not.

Presently I plan to side-step the issues with the forwarding calculation by compacting regions of the heap individually, with regions smaller than or equal in size to the contiguous chunks of a discontiguous heap.

@k-sareen
Collaborator

@qinsoon Should we skip the Compressor for MockVM tests for the time-being then?

@qinsoon
Member

qinsoon commented Jul 24, 2025

@qinsoon Should we skip the Compressor for MockVM tests for the time-being then?

Sounds good to me. Just create an issue about how we can allow MockVM to run tests for the Compressor (I understand MockVM tests have their own limitations), or skip creating the issue if there is a plan to fix it in the near future.

@k-sareen
Collaborator

@no-defun-allowed Can you just temporarily set ALL_PLANS to ALL_DISCONTIGUOUS_PLANS in the test script and add a comment there? That should work I think. Then we can merge this PR.

@no-defun-allowed
Collaborator Author

@no-defun-allowed Can you just temporarily set ALL_PLANS to ALL_DISCONTIGUOUS_PLANS in the test script and add a comment there? That should work I think. Then we can merge this PR.

Done.

@qinsoon qinsoon left a comment (Member)

LGTM

@k-sareen k-sareen added this pull request to the merge queue Jul 24, 2025
Merged via the queue into mmtk:master with commit d93262b Jul 24, 2025
31 of 33 checks passed